# Structured Data Extraction

Visionocr 3B 061125 GGUF

An OCR model fine-tuned from Qwen2.5-VL-3B-Instruct, focused on document-level OCR, long-context vision-language understanding, and conversion of mathematical content to LaTeX.

Transformers English

Qwen2.5 VL 7B Instruct GGUF

Qwen2.5-VL is the latest vision-language model from the Qwen family, featuring powerful visual understanding and multimodal processing capabilities, supporting image and video analysis with structured output.

Image-to-Text English

Qwen2.5 VL 3B Instruct GGUF

Qwen2.5-VL is the latest vision-language model in the Qwen family, featuring powerful visual understanding and multimodal processing capabilities.

Image-to-Text English

Docscopeocr 7B 050425 Exp

docscopeOCR-7B-050425-exp is a model fine-tuned from Qwen/Qwen2.5-VL-7B-Instruct, focused on document-level OCR, long-context vision-language understanding, and accurate image-to-text conversion of mathematical content into LaTeX.

Transformers Supports Multiple Languages

Qwen2.5 VL Instruct 3B Geo

Qwen2.5-VL is the latest vision-language model in the Qwen family, focusing on enhanced visual understanding and agent capabilities.

Transformers English

Qwen2.5 VL 72B Instruct AWQ Fix

Qwen2.5-VL is the latest vision-language model in the Qwen family, featuring powerful visual understanding and agent capabilities, supporting multi-format visual localization and structured output generation.

Transformers English

Qwen2.5 VL 72B Instruct AWQ

Qwen2.5-VL is a multimodal large language model launched by the QwenLM team, featuring powerful visual understanding and intelligent agent capabilities, supporting various input formats including images, videos, and text.

Transformers English

Qwen2.5 VL 72B Instruct Pointer AWQ

Qwen2.5-VL is the latest vision-language model in the Qwen family, featuring enhanced visual understanding, agent capabilities, and structured output generation.

Transformers English

Qwen2.5 VL 7B Instruct AWQ

Qwen2.5-VL is a multimodal vision-language model launched by Tongyi Qianwen, featuring powerful image understanding and text generation capabilities.

Transformers English

Qwen2.5 VL 3B Instruct 4bit

Qwen2.5-VL is the latest vision-language model in the Qwen family, featuring enhanced visual understanding, agent capabilities, and long video processing.

Transformers English

A financial table question answering model based on the LayoutLM architecture, specifically designed to extract and answer structured questions from financial tables.

Question Answering System

Transformers English

Output LayoutLMv3 V7

A document understanding model fine-tuned from microsoft/layoutlmv3-base, excelling in document layout analysis tasks.

Text Recognition

Table Transformer Detection Custom Ale

A table detection model based on the DETR architecture, specifically designed to identify table regions in documents.

Text Recognition

This model is a fine-tuned version of microsoft/layoutlmv2-base-uncased on the generator dataset, suitable for document understanding and layout analysis tasks.

Large Language Model

Donut Receipt V2

A model fine-tuned from naver-clova-ix/donut-base, likely intended for receipt recognition or document understanding tasks.

Large Language Model

CORD-v2 is a model for image-to-text tasks, primarily used for extracting and recognizing text content from images.

Text Recognition

A document image understanding model fine-tuned from naver-clova-ix/donut-base-finetuned-cord-v2.

Donut Base Finetuned Cord V2

Donut is a visual document understanding model based on Swin Transformer, specifically fine-tuned for the CORD dataset, capable of extracting structured text information from images.

Table Detection

A table detection model based on the DETR architecture, specifically designed to identify and extract tables from unstructured documents.

Object Detection

Donut Base Sroie

A model fine-tuned from naver-clova-ix/donut-base on an image folder dataset; no specific use case is stated.

Text Recognition

A model fine-tuned from naver-clova-ix/donut-base; its specific uses and functions are not described.

Donut Base Receipt V3

A receipt recognition model fine-tuned from naver-clova-ix/donut-base.

Large Language Model

A model fine-tuned from philschmid/donut-base-sroie, suitable for image processing tasks.

Text Recognition

Yolov8n Table Extraction

A table detection model based on YOLOv8, capable of identifying table regions in documents, supporting both bordered and borderless table types.

Object Detection

Donut Base Sroie

This model is a fine-tuned version of naver-clova-ix/donut-base on an image folder dataset, suitable for document understanding tasks.

Text Recognition

Donut Base Sroie

A document understanding model fine-tuned from philschmid/donut-base-sroie.

Text Recognition

Donut Base Medical Handwritten Blocks Data Extraction

A model based on the Donut architecture, specifically designed for extracting structured data from handwritten medical documents.

Text Recognition

DETR Table Detection

Table Transformer is a table detection model based on the DETR architecture, specifically designed to detect and recognize table structures from document images.

Text Recognition

Transformers English

Donut Base Sroie

A document understanding model fine-tuned from naver-clova-ix/donut-base, suitable for extracting text from images.

Text Recognition

Layoutlmv3 Finetuned Cord

A LayoutLMv3-based document understanding model fine-tuned on the CORD dataset, excelling in document token classification tasks.

Text Recognition

Layoutlmv2 Finetuned Sroie Mod

A document understanding model fine-tuned from microsoft/layoutlmv2-base-uncased, suitable for structured document information extraction tasks.

Large Language Model

Theivaprakasham

Layoutlmv2 Finetuned Sroie

A LayoutLMv2-based document information extraction model fine-tuned on the SROIE dataset, excelling at extracting key fields from receipt documents.

Sequence Labeling

Theivaprakasham
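
Several of the LayoutLM-based entries above are described as token classification or sequence labeling models, which means they emit one tag per token rather than finished fields. As a rough illustration of how such per-token output might be turned into structured data, the sketch below groups BIO-style tags into field values. It is a minimal sketch, not the post-processing used by any listed model: the function name `group_bio_fields`, the BIO tagging scheme, and the label names (`COMPANY`, `TOTAL`, `DATE`) are assumptions chosen for illustration, and the tokens and tags are hand-written rather than real model output.

```python
from typing import Dict, List, Optional


def group_bio_fields(tokens: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Merge token-level BIO tags into field-level strings (hypothetical helper)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    fields: Dict[str, List[str]] = {}
    label: Optional[str] = None  # label of the span currently being built
    words: List[str] = []        # tokens collected for that span

    for token, tag in zip(tokens, tags):
        prefix, _, name = tag.partition("-")

        # An I- tag extends the open span only if it carries the same label.
        if prefix == "I" and label is not None and name == label:
            words.append(token)
            continue

        # Any other tag closes the open span, if there is one.
        if label is not None:
            fields.setdefault(label, []).append(" ".join(words))

        # A B- tag opens a new span; "O" and orphan I- tags are dropped.
        if prefix == "B" and name:
            label, words = name, [token]
        else:
            label, words = None, []

    # Close a span that runs to the end of the sequence.
    if label is not None:
        fields.setdefault(label, []).append(" ".join(words))
    return fields


# Hand-written example input (assumed labels, not real model output).
example_tokens = ["ACME", "MART", "Total", "12.50", "2019-03-01"]
example_tags = ["B-COMPANY", "I-COMPANY", "O", "B-TOTAL", "B-DATE"]
print(group_bio_fields(example_tokens, example_tags))
```

Each label maps to a list because a receipt can contain more than one span with the same label; a caller that expects a single value per field would need to choose one, for example the first.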
